-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add raw batch data wrapper query #77
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the folder structure is a good choice. This seems strictly accounting related so it should go under cowprotocol/accounting.
cast(capped_payment as decimal(38, 0)) as capped_payment, | ||
cast(winning_score as decimal(38, 0)) as winning_score, | ||
cast(reference_score as decimal(38, 0)) as reference_score | ||
from dune.cowprotocol.dataset_batch_data_{{blockchain}}_2024_10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to add these 15 lines manually every month? Can we not come up with a better solution where this is automatically generated based on the current date?
If we go down this route, I'd at least like to see this query re-written in a way the the redundant part becomes just
union all
select * from <new month table>
Otherwise this file will be a horror in 6 months from now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is that no one touches this query and this is updated automatically by the script that uploads data on Dune.
Indeed, ideally we should have a select *. These were uploaded using dune-sync-v1. If we are able to do proper type casting with dune-sync-v2, this could go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this is updated automatically by the script that uploads data on Dune.
How would this work specifically? Would the script would make a pull request to this repository and automatically merge a change to the query on the first of each month?
Let's please think this process through to avoid a bad surprise 4 weeks from now when we think we are "done" with this project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok good point about how to sync with this repo. I hadn't thought about this. So what I have in mind is basically what Bram has done when testing some versions of the sync in the Prefect repo: https://github.com/cowprotocol/Prefect/blob/a233d2831e936aa05ff4a7984aa1116580402e11/config/tasks/dune_sync.py#L121
Basically, since the dune-sync job will take a timestamp in order to work, if the timestamp says that it is the first day of the month, it would update the dune query automatically by pushing a change directly to Dune. Which means that the version of the query in this repo would get outdated. Unless we allow the script to also push directly to main, but i am not sure if this is necessary, since again, this query is not to be actively maintained by anyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative that I tried was to actually prefill the query with all tables for the next few years (!). But it seems that Dune complains, as expected, for non-existent tables. Not sure if there is a workaround to that (we could create dummy tables for the next 48 months for example but not sure if we want that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having this be an automated job makes sense (likely it would still benefit from having the only redundant part be select * from table_x
rather than all the select statements. Therefore this query should not live here I believe.
This PR introduces query 4351957, that aims to provide a wrapper for all raw batch data tables we might upload on Dune, for all chains we are operating in.
It unifies the different tables that are already existent, and also finally introduces the auction id and environment columns for each batch/auction.
The plan is that this query will be updated in an automated fashion whenever a new table (beginning of each month) is introduced.